Results 1 - 4 of 4
1.
medrxiv; 2022.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2022.11.14.22282297

ABSTRACT

The continuing emergence of SARS-CoV-2 variants of concern (VOCs) presents a serious public health threat, exacerbating the effects of the COVID-19 pandemic. Although millions of genomes have been deposited in public archives since the start of the pandemic, predicting SARS-CoV-2 clinical characteristics from the genome sequence remains challenging. In this study, we used a collection of over 29,000 high-quality SARS-CoV-2 genomes to build machine learning models that predict clinical detection cycle threshold (Ct) values, which are inversely related to viral load. After evaluating several machine learning methods and parameters, our best model was a random forest regressor that used 10-mer oligonucleotides as features and achieved an R² score of 0.521 ± 0.010 (95% confidence interval over 5 folds) and an RMSE of 5.7 ± 0.034, demonstrating that the models can detect a signal in the genomic data. To test prediction for newly emerging variants, we predicted Ct values for Omicron variants using models trained on earlier variants. We found that approximately 5% of the training data needed to come from the new variant for the model to learn its Ct values. Finally, to understand how the model works, we evaluated the top features and found that the model uses a multitude of k-mers from across the genome to make its predictions. However, among the top k-mers that occurred most frequently across the set of genomes, we observed a cluster of k-mers spanning spike protein regions that correspond to key hallmark VOC variations, including G339, K417, L452, N501, and P681, indicating that these sites are informative to the model and may affect the Ct values observed in clinical samples.
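The abstract names the two ingredients of the best model (overlapping 10-mer counts as features, a random forest regressor as learner) and the two reported metrics (R² and RMSE), but not the authors' pipeline. The sketch below is a minimal, standard-library illustration under stated assumptions: the helper names `kmer_counts` and `feature_matrix` are hypothetical, and skipping k-mers that contain non-ACGT characters (such as ambiguous `N` calls) is an assumption, not the authors' documented choice. The resulting matrix could then be passed to a regressor such as scikit-learn's `RandomForestRegressor`.

```python
from collections import Counter
import math


def kmer_counts(seq: str, k: int = 10) -> Counter:
    """Count overlapping k-mers in a nucleotide sequence (hypothetical helper).

    Assumption: k-mers containing any character other than A/C/G/T
    (e.g. ambiguous 'N' base calls) are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_matrix(seqs, k: int = 10):
    """Build a genome-by-k-mer count matrix (hypothetical helper).

    Returns the sorted k-mer vocabulary and one row of counts per genome;
    k-mers absent from a genome get a count of 0.
    """
    per_seq = [kmer_counts(s, k) for s in seqs]
    vocab = sorted(set().union(*per_seq))
    rows = [[c[kmer] for kmer in vocab] for c in per_seq]
    return vocab, rows


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred) -> float:
    """Root mean squared error, in the same units as the Ct values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

For example, `kmer_counts("ACGTACGT", k=4)` counts `ACGT` twice and `CGTA`, `GTAC`, and `TACG` once each. Because RMSE is in Ct units, the reported RMSE of 5.7 can be read as a typical prediction error of a few cycles.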


Subject(s)
COVID-19
2.
medrxiv; 2021.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2021.05.20.21257552

ABSTRACT

Genetic variants of the SARS-CoV-2 virus are of substantial concern because they can detrimentally alter the course of the pandemic and disease features in individual patients. Here we report SARS-CoV-2 genome sequences from 12,476 patients in the Houston Methodist healthcare system diagnosed from January 1, 2021 through May 31, 2021. The SARS-CoV-2 variant designated U.K. B.1.1.7 increased rapidly and caused 63%-90% of all new cases in the Houston area in the latter half of May. Eleven of the 3,276 B.1.1.7 genomes had an E484K change in the spike protein. Compared with non-B.1.1.7 patients, individuals with B.1.1.7 had a significantly lower cycle threshold value (a proxy for higher virus load) and a significantly higher rate of hospitalization. Other variants (e.g., B.1.429, B.1.427, P.1, P.2, and R.1) also increased rapidly, although to a lesser magnitude than B.1.1.7. We identified 22 patients infected with B.1.617 "India" variants; these patients had a high rate of hospitalization. Vaccine breakthrough cases (n=207) were caused by a heterogeneous array of virus genotypes, including many that are not variants of interest or concern. In the aggregate, our study delineates the trajectory of concerning SARS-CoV-2 variants circulating in a major metropolitan area, documents B.1.1.7 as the major cause of new cases in Houston, and heralds the arrival and spread of B.1.617 variants in the metroplex.

3.
medrxiv; 2020.
Preprint in English | medRxiv | ID: ppzbmed-10.1101.2020.09.22.20199125

ABSTRACT

We sequenced the genomes of 5,085 SARS-CoV-2 strains causing two COVID-19 disease waves in metropolitan Houston, Texas, an ethnically diverse region with seven million residents. The genomes were from viruses recovered in the earliest recognized phase of the pandemic in Houston and in an ongoing massive second wave of infections. The virus was originally introduced into Houston many times independently. Virtually all strains in the second wave have a Gly614 amino acid replacement in the spike protein, a polymorphism that has been linked to increased transmission and infectivity. Patients infected with Gly614 variant strains had significantly higher virus loads in the nasopharynx at initial diagnosis. We found little evidence of a significant relationship between virus genotypes and altered virulence, stressing the link between disease severity, underlying medical conditions, and host genetics. Some regions of the spike protein, the primary target of global vaccine efforts, are replete with amino acid replacements, perhaps indicating the action of selection. We exploited the genomic data to generate defined single amino acid replacements in the receptor binding domain of the spike protein that decreased recognition by the neutralizing monoclonal antibody CR3022. Our study is the first analysis of the molecular architecture of SARS-CoV-2 in two infection waves in a major metropolitan region. The findings will help us to understand the origin, composition, and trajectory of future infection waves, and the potential effect of the host immune response and therapeutic maneuvers on SARS-CoV-2 evolution.

IMPORTANCE

There is concern about second and subsequent waves of COVID-19, caused by the SARS-CoV-2 coronavirus, in communities worldwide that have already had an initial disease wave. Metropolitan Houston, Texas, with a population of 7 million, is experiencing a massive second disease wave that began in late May 2020. To understand SARS-CoV-2 molecular population genomic architecture, evolution, and the relationship between virus genotypes and patient features, we sequenced the genomes of 5,085 SARS-CoV-2 strains from these two waves. Our study provides the first molecular characterization of SARS-CoV-2 strains causing two distinct COVID-19 disease waves.


Subject(s)
COVID-19
4.
biorxiv; 2020.
Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2020.05.01.072652

ABSTRACT

We sequenced the genomes of 320 SARS-CoV-2 strains from COVID-19 patients in metropolitan Houston, Texas, an ethnically diverse region with seven million residents. These genomes were from the viruses causing infections in the earliest recognized phase of the pandemic affecting Houston. We identified substantial viral genomic diversity, which we interpret to mean that the virus was introduced into Houston many times independently by individuals who had traveled from different parts of the country and the world. Most viruses are apparent progeny of strains derived from Europe and Asia. We found no significant evidence of more virulent viral types, stressing the link between severe disease, underlying medical conditions, and perhaps host genetics. We discovered a signal of selection acting on the spike protein, the primary target of massive vaccine efforts worldwide. The data provide a critical resource for assessing virus evolution, the origin of new outbreaks, and the effect of the host immune response.

Significance

COVID-19, the disease caused by the SARS-CoV-2 virus, is a global pandemic. To better understand the first phase of virus spread in metropolitan Houston, Texas, we sequenced the genomes of 320 SARS-CoV-2 strains recovered from COVID-19 patients early in the Houston viral arc. We found no evidence that a particular strain or its progeny causes more severe disease, underscoring the connection between severe disease, underlying health conditions, and host genetics. Some amino acid replacements in the spike protein suggest that positive immune selection is shaping variation in this protein. Our analysis traces the early molecular architecture of SARS-CoV-2 in Houston and will help us to understand the origin and trajectory of future infection spikes.


Subject(s)
COVID-19